Abstract. Flow-Updating (FU) is a fault-tolerant technique that has
proved to be efficient in practice for the distributed computation of aggre- 
gate functions in communication networks where individual processors 
do not have access to global information. Previous distributed aggre- 
gation protocols, based on repeated sharing of input values (or mass) 
among processors, sometimes called Mass-Distribution (MD) protocols, 
are not resilient to communication failures (or message loss) because 
such failures yield a loss of mass. 

In this paper, we present a protocol which we call Mass-Distribution 
with Flow-Updating (MDFU). We obtain MDFU by applying FU 
techniques to classic MD. We analyze the convergence time of MDFU 
showing that stochastic message loss produces low overhead. This is the 
first convergence proof of an FU-based algorithm. We evaluate MDFU ex- 
perimentally, comparing it with previous MD and FU protocols, and ver- 
ifying the behavior predicted by the analysis. Finally, given that MDFU 
incurs a fixed deviation proportional to the message-loss rate, we adjust 
the accuracy of MDFU heuristically in a new protocol called MDFU 
with Linear Prediction (MDFU-LP). The evaluation shows that both
MDFU and MDFU-LP behave very well in practice, even under high 
rates of message loss and even changing the input values dynamically. 

1 Introduction



The distributed computation of algebraic aggregate functions is particularly
challenging in settings where the processing nodes do not have access to global
information such as the input size. A good example of such a scenario is Sensor
Networks [1, 28], where unreliable sensor nodes are deployed at random and the
overall number of nodes that actually start up and sense input values may be
unknown. Under such conditions, well-known techniques for distributing
information throughout the network, such as Broadcast [21] or Gossiping [11],
cannot be directly applied, and data collection is only practicable if
aggregation is performed. Even more challenging is that loss of messages
between nodes or even node crashes are likely in such harsh settings. It has
been proved [2] that the problem of aggregating values distributedly in
networks where processing nodes may join and leave arbitrarily is intractable.
Hence, arbitrary adversarial message loss also renders the problem intractable,
but a weaker adversary, for instance a stochastic one as in Dynamic Networks
[7], is of interest. In this paper, under a stochastic model of message loss,
we study communication networks where each node holds an input value and the
average of those values must be obtained by all nodes, none of which has access
to global information about the network, not even the total number of nodes n.

A classic distributed technique for aggregation, sometimes called
Mass-Distribution (MD) [1, 10], works in rounds. In each round, each node
shares a fraction of its current average estimation with other nodes, starting
from the input values [3, 5, 6, 19, 27, 29, 32, 33]. Details differ from paper
to paper, but a common problem is that, in the face of message loss, those
protocols either do not converge to a correct output or require some
instantaneous failure detector mechanism that updates the topology information
at each node in each round. Recently [17, 18], a heuristic termed Flow-Updating
(FU) addressed the problem assuming stochastic message loss [18], and even
assuming that input values change and nodes may fail [17]. The idea underlying
FU is to keep track of an aggregate function of all communication for each pair
of communicating nodes, since the beginning of the protocol, so that the
current value at a node can be recomputed from scratch in each round. Empirical
evaluation has shown that FU behaves very well in practice [17, 18], but such
protocols have eluded analysis until now.

In this paper, we introduce the concept of FU to MD. First, we present a
protocol that we call Mass-Distribution with Flow-Updating (MDFU). The main
difference with MD is that, instead of computing incrementally, the average is
computed from scratch in each round using the initial input value and the
accumulated value shared with other nodes so far (which we refer to as either
mass shared, or flow passed). The main difference with FU is that, if messages
are not lost, the algorithm is exactly MD, which facilitates the theoretical
analysis of the convergence time under failures parameterized by the failure
probability (or message-loss rate).

4 Other algebraic aggregate functions can be computed within the same bounds
using an average protocol [6, 19].

Our results. We first leverage previous work on bounding the mixing time of
Markov chains [30] to show that, for any $0 < \xi < 1$, the convergence time
of MDFU under reliable communication is $2\ln(n/\xi)/\Phi(G)^2$, where
$\Phi(G)$ is the conductance of the underlying graph characterizing the
execution of MDFU on the network. Then, we show that, with probability at least
$1 - 1/n$, for a message-loss rate $f < 1/\ln(2\Delta e)^3$, the multiplicative
overhead on the convergence time produced by message loss is less than
$1/(1 - \sqrt{f\ln(2\Delta e)^3})$, and it is constant for
$f \le 1/(e(2\Delta e)^e)$, where $\Delta$ is the maximum number of neighbors
of any node. Also, we show that, with probability at least $1 - 1/n$, for any
$0 < \xi < 1$, after convergence the expected average estimation at any node is
in the interval $[(1-\xi)(1-f)v, (1+\xi)v]$. This is the first convergence
proof for an FU-based algorithm.

In MDFU, if some flow is not received, a node computes the current esti- 
mation using the last flow received. Thus, in presence of message loss, nodes do 
not converge to the average and only some parametric bound can be guaran- 
teed as shown. Aiming to improve the accuracy of MDFU, we present a new 
heuristic protocol that we call MDFU with Linear Prediction (MDFU-LP). 
The difference with MDFU is that if some flow is not received a node computes 
the current estimation using an estimation of the flow that should have been 
received. 

We evaluate MDFU and MDFU-LP experimentally and find that the per- 
formance of MDFU is comparable to FU and other competing algorithms under 
reliable communication. In the presence of message loss, the empirical evaluation 
shows that MDFU behaves as predicted in the analysis converging to the average 
with a bias proportional to the message-loss rate. This bias is not present in the 
original FU, which converges to the correct value even under message loss. In 
a third set of evaluations, we observe that MDFU-LP converges to the correct 
value even under high message loss rates, with the same speed as under reliable 
communication. We also test MDFU under changing input values to verify that 
it tolerates dynamic changes in practice, in contrast to classic MD algorithms, 
which need to restart the computation each time values are changed. 

Roadmap. In Section 2 we formally define the model and the problem, and we
give an overview of related work. Section 3 includes the details of MDFU and
its analysis, whereas its empirical evaluation is covered in Section 4. In
Section 5 we present the details of MDFU-LP and its experimental evaluation.
Section 6 evaluates MDFU in a dynamic setting, where input values change over
time.



2 Preliminaries 



Model. We consider a static connected communication network formed by a
set $V$ of $n$ processing nodes. We assume that each node has an identifier
(ID). Any pair of nodes $i, j \in V$ such that $i$ may send messages to $j$
without relying on other nodes (one hop) are called neighbors. We assume that
the IDs are assigned so that each node is able to distinguish all its
neighbors. The set of ordered pairs of neighbors (or, edges) is called $E$. The
network is symmetric, meaning that, for any $i, j \in V$, $(i,j) \in E$ if and
only if $(j,i) \in E$. The set of neighbors of a given node $i$ is denoted as
$N_i$ and $|N_i|$ is called the degree of $i$. For each pair of nodes
$i, j \in V$, the maximum degree between $i, j$ is denoted as
$D_{ij} = \max\{|N_i|, |N_j|\}$. The maximum degree throughout the network is
denoted as $\Delta = \max_{i \in V} |N_i|$. Each node $i$ knows $N_i$ and
$D_{ij}$ for each $j \in N_i$, but does not know the size of the whole network
$n$. The time is slotted in rounds and each round is divided in two phases. In
each round, a node is able to send (resp. receive) one message to (resp. from)
all its neighbors (communication phase) and to perform local computations
(computation phase). However, for each $(i,j) \in E$ and for each communication
phase, a message from $i$ to $j$ is lost independently with probability $f$.
This is a crucial difference with previous work where, although edge failures
are considered, messages are not lost thanks to the availability of some
failure detection mechanism. More details are given in the previous work
section. Nodes are assumed to be reliable, i.e., they do not fail.

Problem. Each node $i$ holds an input value $v_i$, for $1 \le i \le n$. The aim
is for each node to compute the average $v = \sum_{i=1}^{n} v_i / n$ without
any global knowledge of the network. We focus on the algorithmic cost of such
computation, counting only the number of rounds that the computation takes
after simultaneous startup of all nodes, leaving aside medium access issues to
other layers. This assumption could be removed as in [10].



Previous Work. Previous work on aggregate computations has been particularly
prolific for the area of Radio Networks, including both theoretical and
experimental work (e.g., [12]). Many of those and other aggregation techniques
exploit global information of the network [10, 12, 22, 23], or are not
resilient to message loss [3, 5, 19].



FU is a recent fault-tolerant approach [17, 18] inspired by the concept of
flows (from graph theory). Like common MD techniques, it is based on the
execution of an iterative averaging process at all nodes, and all estimates
eventually converge to the system-wide average. MD protocols exchange "mass",
which leads them to converge to a wrong result in the case of message loss. In
contrast, FU does not exchange "mass". Instead, it performs idempotent flow
exchanges, which provide resilience against message loss. In particular, FU
keeps the initial input value at each node unchanged (in a sense, always
conserving the global mass), exchanging and updating flows between neighbors
for them to produce a new estimate. The estimate is computed at each node from
the input values and the contribution of the flows. No theoretical bounds on
the performance of the algorithm were provided. Empirical evaluation shows that
FU performs better than classic MD algorithms, especially in low-degree
networks, and it supports high levels of message loss [18]. Moreover, it
self-adapts to dynamic changes (i.e., nodes leaving or arriving and input
values changing) without the restart mechanism that other approaches require,
and tolerates node crashes [17].



MD protocols for average computations in arbitrary networks based on gossiping
(exchanging values in pairs) were studied in [3, 19]. Results in [3] are
presented for all gossip-based algorithms by characterizing them by a matrix
that models how the algorithm evolves while sharing values in pairs
iteratively. As in our results, the time bounds shown are given as a function
of the spectral decomposition of the graph underlying the computation. The work
is focused on optimizing distributedly the spectral gap, in order to minimize
convergence time. The dynamics of the model are motivated by changes in
topology induced by nodes leaving and joining the network. Those changes may be
introduced in the probability of establishing communication between any two
nodes. However, the delivery of messages has to be reliable to ensure mass
conservation. An algorithm called Push-Sum that takes advantage of the
broadcast nature of Radio Networks (i.e., it is not restricted to gossip) is
included in [19], yielding similar bounds. Chen, Pandurangan, and Hu [5]
present an MD algorithm that first builds a forest over the network, where each
root collects the information, and then a gossiping algorithm among the roots
is used. The authors show a reduction in energy consumption with respect to the
uniform gossip algorithm. On the other hand, the MD algorithm presented in [6]
relies on a different randomly chosen local leader in each round to distribute
values. The bounds given are also parameterized by the eigen-structure of the
underlying graph. This result was extended more recently [4] to networks with a
time-varying connection graph, but the protocol requires updating the matrix
underlying such graph in each round.



MD protocols have also been used for Distributed Average Consensus
[27, 29, 31, 34] within Control Theory, but they do not apply to our model. For
example, in [33, 34] the model includes unreliable communication links, but the
algorithm requires instantaneous update of the topology information held at
each node at the beginning of each round. Others either rely on similar
features [27, 29, 31] or do not consider changes in topology at all [32].



The common problem in all the MD protocols is that they are not resilient 
to message loss, because it implies a loss of mass. Hence, if messages are lost, 
they need to restart the computation from scratch. In MDFU, message loss has 
an impact on convergence time, which we show to be small, but the compu- 
tation recovers from those losses, yielding the correct value. In fact, it is this 
characteristic of MDFU, and of FU in general, that makes the technique suitable
for dynamic settings in which the input values change with time. 



3 MDFU 

As in previous work [3, 6, 10, 19], MDFU is based on repeatedly sharing among
neighbors a fraction of the average estimated so far. Unlike in those papers,
in MDFU the estimation is computed from scratch in each round, as in FU
[17, 18]. For that purpose, each node keeps track of the cumulative value
passed to each neighbor (or, cumulative flow) since the protocol started.
Together with the original input value, those flows allow each node to
recompute the average


Algorithm 1: MDFU. Pseudocode for node i. e_i is the estimate of node i.
F_in(j) is the cumulative inflow from node j. F_out(j) is the cumulative
outflow to node j.

    // initialization
    e_i ← v_i
    foreach j ∈ N_i do
        F_in(j) ← 0
        F_out(j) ← e_i / (2 D_ij)
    foreach round do
        // communication phase
        foreach j ∈ N_i do
            Send j message ⟨i, F_out(j)⟩
        foreach ⟨j, F⟩ received do
            F_in(j) ← F
        // computation phase
        e_i ← v_i + Σ_{j ∈ N_i} (F_in(j) − F_out(j))
        foreach j ∈ N_i do
            F_out(j) ← F_out(j) + e_i / (2 D_ij)



estimation in each round. Should some flow from node i to node j be lost, j
temporarily computes the estimation using the last flow received from i.
Further details can be found in Algorithm 1.
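To make the round structure concrete, the following is a minimal simulation sketch of Algorithm 1, not the authors' simulator. The function name `mdfu`, the adjacency-dictionary input and the seeded random generator are assumptions made for illustration; message loss is modeled as an independent drop with probability `loss` per directed message, as in the model.

```python
import random

def mdfu(adj, values, rounds, loss=0.0, seed=0):
    """Synchronous simulation of MDFU; returns the final estimates per node."""
    rng = random.Random(seed)
    nodes = list(adj)
    # D[i, j] = max degree of the two endpoints of edge (i, j)
    D = {(i, j): max(len(adj[i]), len(adj[j])) for i in nodes for j in adj[i]}
    e = dict(values)
    f_in = {k: 0.0 for k in D}                      # f_in[i, j]: inflow at i from j
    f_out = {(i, j): e[i] / (2 * D[i, j]) for (i, j) in D}
    for _ in range(rounds):
        # communication phase: the message i -> j is dropped with probability loss
        for (i, j) in D:
            if rng.random() >= loss:
                f_in[j, i] = f_out[i, j]
        # computation phase: recompute the estimate from scratch, then add flow
        for i in nodes:
            e[i] = values[i] + sum(f_in[i, j] - f_out[i, j] for j in adj[i])
        for (i, j) in D:
            f_out[i, j] += e[i] / (2 * D[i, j])
    return e

# Ring of 6 nodes with a single non-zero input; the average is 1.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
inputs = {i: (6.0 if i == 0 else 0.0) for i in range(6)}
estimates = mdfu(ring, inputs, rounds=300)
```

With `loss=0.0` every node's estimate approaches the average; passing a positive `loss` lets one observe the effect of stale inflows discussed in the text.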

Recall that the aim is to compute the average $v = \sum_{i=1}^{n} v_i / n$ of
all input values. Let $e_i(r)$ be the average estimate of node $i$ in round
$r$, and $e(r) = \max_i \{|e_i(r) - v|/v\}$ be the maximum relative error of
the average estimates in round $r$. We want to bound the number of rounds after
which the maximum relative error is below some parametric value $\xi$.

In each round, a node shares a fraction of its current estimate with each
neighbor. Therefore, the execution of each round can be characterized by a
transition matrix, denoted as $P = (p_{ij})$, $\forall i, j \in V$, such that
for any round $r$ where messages are not lost

$$p_{ij} = \begin{cases}
1/(2D_{ij}) & \text{if } i \neq j \text{ and } (i,j) \in E,\\
1 - \sum_{k \in N_i} 1/(2D_{ik}) & \text{if } i = j,\\
0 & \text{otherwise,}
\end{cases}$$

and $\mathbf{e}(r+1) = \mathbf{e}(r)P$, where $\mathbf{e}(\cdot)$ is the row
vector $(e_1(\cdot)\, e_2(\cdot) \ldots e_n(\cdot))$.
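The matrix characterization can be checked on a small instance. The sketch below builds P from an adjacency dictionary and applies one lossless round as a row-vector product; the helper names `transition_matrix` and `step` are hypothetical, and the graph is assumed to be symmetric with no self-loops.

```python
def transition_matrix(adj):
    """Build P with p_ij = 1/(2*D_ij) on edges and the remainder on the diagonal."""
    nodes = sorted(adj)
    idx = {u: k for k, u in enumerate(nodes)}
    n = len(nodes)
    P = [[0.0] * n for _ in range(n)]
    for i in nodes:
        for j in adj[i]:
            d_ij = max(len(adj[i]), len(adj[j]))
            P[idx[i]][idx[j]] = 1.0 / (2 * d_ij)
        # the diagonal entry is still 0 here, so the row sum covers neighbors only
        P[idx[i]][idx[i]] = 1.0 - sum(P[idx[i]])
    return P

def step(e, P):
    """One lossless round: e(r+1) = e(r) P, with e a row vector."""
    n = len(e)
    return [sum(e[i] * P[i][j] for i in range(n)) for j in range(n)]

# Path 0 - 1 - 2: D_01 = D_12 = 2, so every off-diagonal edge entry is 1/4.
P = transition_matrix({0: [1], 1: [0, 2], 2: [1]})
e1 = step([3.0, 0.0, 0.0], P)
```

Because D_ij is symmetric in i and j, P is symmetric, so its rows and columns both sum to 1, and the total of the estimates is preserved by each lossless round.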

3.1 Convergence Time for f = 0

Consider first the case when the communication is reliable, that is, $f = 0$.
Then, the above characterization is round independent and, given that $P$ is
stochastic, it can be seen as the transition matrix of a time-homogeneous
Markov chain $(X_r)_{r=1}^{\infty}$ with finite state space $V$. Furthermore,
$(X_r)_{r=1}^{\infty}$ is irreducible and aperiodic, hence it is ergodic and it
has a unique stationary distribution. Given that $P$ is doubly stochastic, such
stationary distribution is $\pi_i = 1/n$ for all $i \in V$. Thus, bounding the
convergence time of $(X_r)_{r=1}^{\infty}$ we have a bound for the convergence
time of MDFU without message loss. The following notation will be useful. Let
$G$ be a weighted undirected graph with set of nodes $V$ and where, for each
pair $i, j \in V$, the edge $(i,j)$ has weight $\pi_i p_{ij}$. $G$ is called
the underlying graph of the Markov chain $(X_r)_{r=1}^{\infty}$. The following
quantity characterizes the likelihood that the chain does not stay in a subset
of the state space with small stationary probability. Let the conductance of
graph $G$ be

$$\Phi(G) = \min_{S \subset V,\ \sum_{i \in S} \pi_i \le 1/2}
\frac{\sum_{i \in S,\, j \notin S} \pi_i p_{ij}}{\sum_{i \in S} \pi_i}.$$


The following theorem shows the convergence time of MDFU with reliable
communication parameterized in the conductance of $G$.

Theorem 1. For any communication network of $n$ nodes running MDFU, for any
$0 < \xi < 1$, and for $r_c = 2\ln(n/\xi)/\Phi(G)^2$, if $f = 0$, it holds that
$e(r) \le \xi$ for any round $r \ge r_c$, where $\Phi(G)$ is the conductance of
the underlying graph characterizing the execution of MDFU on the network.

Proof. We want to find a value of $r_c$ such that for all $r \ge r_c$ it holds
that $\max_i \{|e_i(r) - v|/v\} \le \xi$. Then, we want
$\max_i \{|e_i(r)/\sum_{j \in V} v_j - 1/n|\} \le \xi/n$. Given that
$e_i(r) = \sum_{j \in V} v_j (P^r)_{ji}$, it is enough to have
$\max_{j,i \in V} \{|(P^r)_{ji} - 1/n|\} \le \xi/n$. On the other hand, given
that $p_{ij}\pi_i = p_{ji}\pi_j$ for all $i, j \in V$, the Markov chain is
time-reversible. Then, as proved in [30], it is
$\max_{i,j \in V} |(P^r)_{ij} - \pi_j|/\pi_j \le \lambda_1^r / \min_{j \in V} \pi_j$,
where $\lambda_1$ is the second largest eigenvalue of $P$ (all the eigenvalues
of $P$ are positive because $p_{ii} \ge 1/2$ for all $i \in V$). Given that
$\pi_j = 1/n$ for all $j \in V$, we have
$\max_{i,j \in V} |(P^r)_{ij} - 1/n| \le \lambda_1^r$. Thus, from the
inequality above, it is enough to have $\lambda_1^r \le \xi/n$. As proved also
in [30], given that $(X_r)_{r=1}^{\infty}$ is ergodic and time-reversible, it
is $\lambda_1 \le 1 - \Phi(G)^2/2$. Then, it is enough that
$(1 - \Phi(G)^2/2)^r \le \xi/n$. Given that $\Phi(G) \le 1$, using that
$1 - x \le e^{-x}$ for $x < 1$, the claim follows.
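As a sanity check of Theorem 1 on a tiny instance, the sketch below computes the conductance by brute force and verifies that the bound r_c is sufficient on a 6-node ring. It assumes the standard conductance definition restricted to the uniform stationary distribution used here; the names `ring_matrix`, `conductance` and `convergence_rounds` are illustrative, and the subset enumeration is exponential, so it is only meant for very small n.

```python
import math
from itertools import combinations

def ring_matrix(n):
    """P for a ring (n >= 3): all degrees are 2, so p_ij = 1/4 and p_ii = 1/2."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 0.5
        P[i][(i + 1) % n] = 0.25
        P[i][(i - 1) % n] = 0.25
    return P

def conductance(P):
    """Brute-force conductance assuming pi_i = 1/n (P doubly stochastic)."""
    n = len(P)
    best = float("inf")
    for size in range(1, n // 2 + 1):          # subsets with pi(S) <= 1/2
        for S in combinations(range(n), size):
            inside = set(S)
            flow = sum(P[i][j] for i in S for j in range(n) if j not in inside) / n
            best = min(best, flow / (size / n))
    return best

def convergence_rounds(n, xi, phi):
    """r_c = 2 ln(n / xi) / phi^2 from Theorem 1."""
    return 2 * math.log(n / xi) / phi ** 2

n, xi = 6, 0.01
P = ring_matrix(n)
r_c = math.ceil(convergence_rounds(n, xi, conductance(P)))
e = [6.0] + [0.0] * (n - 1)                    # average v = 1
for _ in range(r_c):
    e = [sum(e[i] * P[i][j] for i in range(n)) for j in range(n)]
max_rel_err = max(abs(x - 1.0) for x in e)     # relative error, since v = 1
```

On this ring the bound is loose: the second largest eigenvalue is 3/4, so far fewer rounds already suffice, which is consistent with r_c being an upper bound.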



3.2 Convergence Time for f > 0

Mixing time of a multiple random walk. Recall that we carry out an average
computation of $n$ input values where each node $i$ shares a $1/(2D_{ij})$
fraction of its estimate in each round of the computation with each neighboring
node $j$. We have characterized each round of the computation with a transition
matrix $P$ so that in each round $r$ the vector of estimates $\mathbf{e}(r)$ is
multiplied by $P$.

The Markov chain defined in Section 3.1 that models the average computation is
also a characterization of a random walk, that is, a stochastic process on the
set of nodes $V$ where a particle moves around the network randomly. In our
case, for each round, instead of choosing the next node where the particle will
be located uniformly among neighbors, the matrix of transition probabilities is
$P$. A state of this process (which of course is also Markovian) is a
distribution of the location of the particle over the nodes. The measure of
this random walk that becomes relevant in our application is the mixing time,
that is, the number of rounds before such distribution is close to uniform. The
mixing time of this random walk is the same as the convergence time of the
Markov chain $(X_r)_{r=1}^{\infty}$, setting appropriately for each case the
desired maximum deviation with respect to the stationary distribution as
follows.

A useful representation of this process in our application is to assume a set
$S$ of particles, all of the same value $\nu$, so that at the beginning each
node $i$ holds a subset $S_i$ of particles such that $|S_i|\nu = v_i$. In order
to analyze the computation along many rounds, we assume that $\nu$ is small
enough so that particles are not divided. We define the mixing time of this
multiple random walk as the number of rounds before the distribution of all
particles is within $\xi/n$ of the uniform, for $0 < \xi < 1$. Without message
loss, it can be seen that the mixing time of the above defined multiple random
walk is the same as the convergence time of the Markov chain
$(X_r)_{r=1}^{\infty}$ defined in Section 3.1. We consider now the case where
messages may be lost.

The following lemma shows that, for $f < 1/\ln(2\Delta e)^3$, the
multiplicative overhead on the mixing time produced by message loss is less
than $1/(1 - \sqrt{f\ln(2\Delta e)^3})$, and it is constant for
$f \le 1/(e(2\Delta e)^e)$. The proof uses concentration bounds on the delay
that any particle may suffer due to message loss.

Lemma 1. Consider any communication network of $n$ nodes running MDFU, any
$0 < f < 1/\ln(2\Delta e)^3$, any $0 < \xi < 1$, let
$r_c = 2\ln(n/\xi)/\Phi(G)^2$, and let

$$q = \begin{cases}
1/e & \text{if } f \le 1/(e(2\Delta e)^e),\\
f\left(\sqrt{4\ln(2\Delta e)^3/f - 3} - 1\right)/2 & \text{otherwise.}
\end{cases}$$

Consider a multiple random walk modeling MDFU as described. With probability
at least $1 - 1/n$, after $r = r_c/(1-q)$ rounds it holds that
$\max_{x \in S,\, i \in V} |p_x(i) - 1/n| \le \xi/n$, where $p_x(i)$ is the
probability that particle $x$ is located at node $i$.
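The quantities in Lemma 1 are easy to evaluate numerically. The sketch below computes q and the resulting overhead factor 1/(1 - q) for a given loss rate and maximum degree. It assumes that ln(2Δe)^3 is read as ln((2Δe)^3) = 3 ln(2Δe), which is the reading under which q solves the quadratic condition used in the proof; the function names are hypothetical.

```python
import math

def log_term(delta):
    """ln((2*Delta*e)^3), the reading of ln(2*Delta*e)^3 assumed in this sketch."""
    return 3 * math.log(2 * delta * math.e)

def delay_fraction(f, delta):
    """q from Lemma 1 for loss rate f and maximum degree delta."""
    L = log_term(delta)
    if not 0 < f < 1 / L:
        raise ValueError("Lemma 1 requires 0 < f < 1/ln(2*Delta*e)^3")
    if f <= 1 / (math.e * (2 * delta * math.e) ** math.e):
        return 1 / math.e                      # constant-overhead regime
    return f * (math.sqrt(4 * L / f - 3) - 1) / 2

def overhead(f, delta):
    """Multiplicative factor on r_c: rounds needed are r_c / (1 - q)."""
    return 1 / (1 - delay_fraction(f, delta))

q = delay_fraction(0.05, 10)                   # 5% loss, maximum degree 10
factor = overhead(0.05, 10)
```

In the non-constant regime q never exceeds sqrt(f ln(2Δe)^3), which is how the overhead bound quoted before the lemma follows from the exact expression.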

Proof. For clarity, we model the network with a directed graph $(V, E)$, with
$V$ and $E$ as defined in the model. A message loss in the edge $(i,j) \in E$
is modeled with a buffer on the edge where a particle is "delayed". For a
computation of $r$ rounds, it is enough to consider at most $n(2\Delta)^r$
particles, because initially there are $n$ input values and each value is
divided $r$ times by at most $2\Delta$. Consider the random walk of a given
particle $x \in S$. For each round, $x$ is delayed with probability $f$. We
bound the mixing time by bounding the number of rounds that any particle is
delayed as follows.

Assume first that $1/(e(2\Delta e)^e) < f < 1/\ln(2\Delta e)^3$. For $r$
rounds, the expected number of rounds when a given particle is delayed is $fr$.
Using Chernoff-Hoeffding bounds [25], the probability that a given particle $x$
is delayed more than $qr$ rounds, $f < q < 1$, is at most
$\exp(-fr(q/f - 1)^2/3)$. Then, the probability that some particle is delayed
more than $qr$ rounds is

$$\Pr(\exists x : x \text{ delayed} > qr) \le
n\left(\frac{2\Delta}{\exp((q-f)^2/(3f))}\right)^{r}.$$

Assuming that $2\Delta\exp(1-q) \le \exp((q-f)^2/(3f))$, we get that

$$\Pr(\exists x : x \text{ delayed} > qr)
\le n\left(\frac{1}{\exp(1-q)}\right)^{\frac{2\ln(n/\xi)}{(1-q)\Phi(G)^2}}
= n\exp\left(-\frac{2\ln(n/\xi)}{\Phi(G)^2}\right)
\le n\exp(-2\ln n) = 1/n,$$

where the last inequality holds given that $\Phi(G) \le 1$ and $\xi < 1$.

Then, it remains to prove

$$2\Delta\exp(1-q) \le \exp((q-f)^2/(3f))$$
$$q^2 + fq + f^2 - f\ln(2\Delta e)^3 \ge 0,$$

which is true for $q = f\left(\sqrt{4\ln(2\Delta e)^3/f - 3} - 1\right)/2$,
which is feasible because, for $f < 1/\ln(2\Delta e)^3$, such value of $q$
implies $f < q < 1$.

Consider now the case $0 < f \le 1/(e(2\Delta e)^e)$. Again, using
Chernoff-Hoeffding bounds, the probability that a given particle $x$ is delayed
more than $qr$ rounds, $f < q < 1$, is at most $((fe/q)^q/e^f)^r$. Then, the
probability that some particle is delayed more than $qr$ rounds is

$$\Pr(\exists x : x \text{ delayed} > qr) \le
n\left(\frac{2\Delta}{e^f}\left(\frac{fe}{q}\right)^{q}\right)^{r}.$$

Assuming that $2\Delta(fe/q)^q/e^f \le 1/\exp(1-q)$ we get, as before,

$$\Pr(\exists x : x \text{ delayed} > qr) \le 1/n.$$

Then, it remains to prove

$$\frac{2\Delta}{e^f}\left(\frac{ef}{q}\right)^{q} \le \frac{1}{e^{1-q}}$$
$$2\Delta e^{1-f} \le (q/f)^q$$
$$2\Delta e \le (q/f)^q,$$

which is true for $f \le 1/(e(2\Delta e)^e)$ and $q = 1/e$.

The expected number of particles at each node as a function of f.
Analyzing a multiple random walk of a set of particles, in Lemma 1 we obtained
a bound on the time that any particle takes to converge to a stationary uniform
distribution. However, for any probability of message loss $f > 0$ and for any
round, there is a positive probability that some particles are located in the
edge buffers defined in the proof of such lemma. Hence, the fact that each
particle is uniformly distributed over nodes does not imply that the expected
average held at the nodes has converged, because only particles located at
nodes are uniformly distributed. We bound the expected error in this section.
The proof of the following lemma is based on computing the overall expected
ratio of particles in nodes with respect to delayed particles.



Lemma 2. Consider a multiple random walk modeling MDFU under the conditions of
Lemma 1. Then, with probability at least $1 - 1/n$, for any round
$r \ge r_c/(1-q)$, the expected number of particles $E(|S_i^{(r)}|)$ in each
node $i$ is
$(1-\xi)(1-f)|S|/n \le E(|S_i^{(r)}|) \le (1+\xi)|S|/n$.

Proof. We consider a multiple random walk of a set of particles $S$ over a
directed graph $(V, E)$, with $V$ and $E$ as defined in the model. A message
loss in the edge $(i,j) \in E$ is modeled with a buffer on the edge $(i,j)$
where a particle is "delayed". The following notation will be useful. For any
round $r$, $S_X^{(r)}$ is the set of particles held at the set $X$ (node set or
edge-buffer set), and $S_i^{(r)}$ is the set of particles held at the node $i$.
Let $p_i = \sum_{j \in N_i} 1/(2D_{ij})$ for any node $i$. By linearity of
expectation, at the end of round $r$, the expected number of particles in all
edge buffers and the expected number of particles in all nodes are

$$E(|S_E^{(r)}|) = \sum_{i \in V} E(|S_i^{(r-1)}|) f p_i + f E(|S_E^{(r-1)}|) \quad (1)$$
$$E(|S_V^{(r)}|) = \sum_{i \in V} E(|S_i^{(r-1)}|)(1 - f p_i) + (1-f) E(|S_E^{(r-1)}|). \quad (2)$$

Using that $p_i \le 1/2$ in (1) and (2) we have

$$E(|S_E^{(r)}|) \le (f/2) E(|S_V^{(r-1)}|) + f E(|S_E^{(r-1)}|)$$
$$E(|S_V^{(r)}|) \ge (1 - f/2) E(|S_V^{(r-1)}|) + (1-f) E(|S_E^{(r-1)}|).$$

Then,

$$\frac{E(|S_E^{(r)}|)}{E(|S_V^{(r)}|)} \le
\frac{(f/2) E(|S_V^{(r-1)}|) + f E(|S_E^{(r-1)}|)}
{(1 - f/2) E(|S_V^{(r-1)}|) + (1-f) E(|S_E^{(r-1)}|)}
\le \frac{f}{1-f}, \text{ because } \frac{1 - f/2}{1-f} \ge \frac{1}{2}.$$

Then, given that $E(|S_V^{(r)}|) + E(|S_E^{(r)}|) = |S|$, we have
$E(|S_V^{(r)}|) \ge (1-f)|S|$. As proved in Lemma 1, with probability at least
$1 - 1/n$, for any round $r \ge r_c/(1-q)$,
$\max_{x \in S,\, i \in V} |p_x(i) - 1/n| \le \xi/n$, where $p_x(i)$ is the
probability that particle $x$ is located at node $i$ and $q$ is as defined in
such lemma. Then, for any node $i \in V$, it is
$(1-\xi)(1-f)|S|/n \le E(|S_i^{(r)}|) \le (1+\xi)|S|/n$ and the claim follows.
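The bound on the fraction of particles held at nodes can be illustrated by iterating the expectation recurrences of the proof directly. The sketch below assumes the recurrences read as: a particle at a node attempts a move with probability p and is then buffered with probability f, and a buffered particle stays buffered with probability f. It further assumes the same p for every node (the worst case p = 1/2 by default); the function name `node_fraction` is hypothetical.

```python
def node_fraction(f, p=0.5, rounds=200):
    """Expected fraction of particles at nodes after iterating the recurrences."""
    at_nodes, in_buffers = 1.0, 0.0            # all particles start at nodes
    for _ in range(rounds):
        at_nodes, in_buffers = (
            (1 - f * p) * at_nodes + (1 - f) * in_buffers,
            f * p * at_nodes + f * in_buffers,
        )
    return at_nodes

fractions = {f: node_fraction(f) for f in (0.01, 0.05, 0.10)}
```

Under these assumptions the iteration settles at (1 - f)/(1 - f + f p), which for p = 1/2 stays above the 1 - f used in the proof.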

Based on the previous lemmata, the following theorem shows the convergence 
time of MDFU. 

Theorem 2. Consider any communication network of $n$ nodes running MDFU. For
any $0 < f < 1/\ln(2\Delta e)^3$, let $q = 1/e$ if
$f \le 1/(e(2\Delta e)^e)$, or
$q = f\left(\sqrt{4\ln(2\Delta e)^3/f - 3} - 1\right)/2$ otherwise, and let
$r_c = 2\ln(n/\xi)/\Phi(G)^2$. Then, with probability at least $1 - 1/n$, for
any $0 < \xi < 1$ and any round $r \ge r_c/(1-q)$, the expected average
estimation at any node $i \in V$ is
$(1-\xi)(1-f)v \le E(e_i(r)) \le (1+\xi)v$, where $\Phi(G)$ is the conductance
of the underlying graph characterizing the execution of MDFU on the network.

Proof. From Lemmas 1 and 2 we know that, under the conditions of this theorem,
for any round $r \ge r_c/(1-q)$ and any node $i \in V$, with probability at
least $1 - 1/n$ the expected number of particles (of the multiple random walk
modeling MDFU) is $(1-\xi)(1-f)|S|/n \le E(|S_i^{(r)}|) \le (1+\xi)|S|/n$.
Then, multiplying by the value of each particle the claim follows.



4 Empirical Evaluation of MDFU 

We evaluated MDFU in a synchronous network simulator, using an Erdős-Rényi
[9] network with 1000 nodes and 5000 links (giving an average degree of 10).
The input values were chosen as when performing node counting [16]; i.e., all
values being 0 except a random node with value 1; this scenario is more
demanding, leading to slower convergence, than uniformly random input values.
The evaluation aimed at: 1) comparing its convergence speed under no loss with
competing algorithms; 2) evaluating its behavior under message loss; 3)
checking its ability to perform continuous estimation over time-varying input
values.



4.1 Convergence Speed Against Related Algorithms Under no 
Faults 

To evaluate whether MDFU is a practical algorithm in terms of convergence
speed, we compared it against three other algorithms: the original
Flow-Updating [17, 18] (FU), Distributed Random Grouping [6] (DRG), and
Push-Synopses [19]. Figure 1 shows the coefficient of variation of the root
mean square error as a function of the number of rounds (averaging 30 runs),
with $\mathrm{CV(RMSE)} = \sqrt{\sum_{i \in V} (e_i - v)^2 / n}\,/\,v$.

It can be seen that MDFU is competitive, providing approximate estimates
slightly faster than FU and DRG and giving reasonably accurate results roughly
in line with them. It loses to them for very high precision estimation and to
Push-Synopses for all precisions (but neither DRG nor Push-Synopses is
fault-tolerant).
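The two error metrics used in the evaluation can be computed from a vector of per-node estimates as sketched below. The sketch assumes the usual definition of CV(RMSE), the root mean square error normalized by the true average, and the maximum relative error as defined in Section 3; the function names are illustrative.

```python
import math

def cv_rmse(estimates, true_average):
    """Root mean square error of the estimates, normalized by the true average."""
    n = len(estimates)
    rmse = math.sqrt(sum((e - true_average) ** 2 for e in estimates) / n)
    return rmse / true_average

def max_relative_error(estimates, true_average):
    """Largest relative deviation of any node's estimate from the true average."""
    return max(abs(e - true_average) / true_average for e in estimates)

# Node-counting style input on 4 nodes: the true average is 1/4.
sample = [0.26, 0.25, 0.24, 0.25]
metrics = (cv_rmse(sample, 0.25), max_relative_error(sample, 0.25))
```

Both metrics are zero exactly when every node holds the true average, and the maximum relative error is never smaller than CV(RMSE).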



4.2 Fault Tolerance: Resilience to Message Loss 

To evaluate the resilience of MDFU to message loss, we performed simulations 
using different rates of message loss (0, 1%, 5%, 10%), where each individual 
message may fail to reach the destination with these given probabilities. We 
measured the effect of message loss on both the CV(RMSE) and also on the 
maximum relative error. As can be seen in Figure 2, as long as there is some
message loss, they do not tend to zero anymore, but converge to a value that is 
a function of the message loss rate. 

Fig. 1. CV(RMSE) over rounds in a 1000-node, 5000-link Erdős-Rényi network.

Fig. 2. Coefficient of variation of the RMSE and maximum relative error for
MDFU in a 1000-node, 5000-link Erdős-Rényi network.



We also measured the behavior of the average of the estimates over the whole 
network, and observed that there is a deviation from the correct value (v, the 
average of the input values) towards lower values. Figure [3] shows the relative 
deviation from the correct value over time, for different message loss rates. It 
can be seen that this bias is roughly proportional to the message loss rate (for 
these small message loss rates). 

Relating these results with the theoretical analysis of MDFU, we can see
that this bias should not come as a surprise. From Theorem 2, the expected
value of the estimation converges to a band between (1 - f)v and v. The
relative deviation of the lower boundary is thus proportional to the message
loss rate. Figure 3 also shows this boundary for the different message loss
rates.

This kind of bias was not present in the original FU, in which the average 
of the estimates tends to the correct value. In MDFU the message loss rate 
limits the precision that can be achieved, but it does not impact convergence, 
contrary to classic mass distribution algorithms where, given message loss, the 
more rounds pass, the more mass is lost and the more the estimates deviate from 
the correct value, failing to converge. 

Fig. 3. Bias on the average estimation over rounds in a 1000-node, 5000-link
Erdős-Rényi network.
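The contrast with classic mass distribution can be illustrated with a toy model. The sketch below is not one of the protocols evaluated in the paper: it assumes a simple incremental MD variant on a ring in which each node keeps half of its estimate and pushes a quarter to each neighbor, so that any lost message removes its mass from the system for good. The function name and parameters are hypothetical.

```python
import random

def classic_md_total_mass(n, rounds, loss, seed=1):
    """Total mass left in a ring of n nodes after incremental mass sharing."""
    rng = random.Random(seed)
    e = [0.0] * n
    e[0] = float(n)                             # total mass n, so the average is 1
    for _ in range(rounds):
        nxt = [x / 2 for x in e]                # each node keeps half (p_ii = 1/2)
        for i in range(n):
            for j in ((i - 1) % n, (i + 1) % n):
                if rng.random() >= loss:        # a lost message destroys its mass
                    nxt[j] += e[i] / 4          # share 1/(2*D_ij) with D_ij = 2
        e = nxt
    return sum(e)

lossless = classic_md_total_mass(6, 100, loss=0.0)
lossy = classic_md_total_mass(6, 100, loss=0.1)
```

Without loss the total mass is conserved, whereas with loss it can only decrease over the rounds, which is the non-convergent drift that the text attributes to classic MD.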



5 MDFU with Linear Prediction 

The explanation for the behavior of MDFU under message loss lies in that only
the estimate converges, while flows keep steadily increasing over time. This
can be seen in the formula $F_{out}(j) \leftarrow F_{out}(j) + e_i/(2D_{ij})$,
where the flow sent to some neighbor increases at each round by a value
depending on the estimate and their mutual degrees. What happens is that,
during convergence, the extra flow that each of two nodes sends over a link
tends to the same value, and the extra outgoing flow cancels out the extra
incoming flow. We can say that it is the velocity (rate of increase) of flows
over a link that converges (to some different value for each link).

This means that, even if the estimate had already converged to the correct 
value, given a message loss, the extra flow that should have been received is 
not added to the estimate, implying a discrete deviation from the correct value. 
This discrete deviation does not converge to zero; thus, we have a bias towards 
lower values and the relative estimation error is prevented from converging to 
zero given some message loss rate. 

Here we improve MDFU by exploiting velocity convergence. For each link, we
keep the velocity (rate of increase) of the flow received. If a message is
lost, we predict the flow that would have been received from the stored flow,
the velocity, and the number of rounds elapsed since the last message was
received over that link; i.e., we perform a linear prediction of the incoming
flow. When a message is received, we update the flow and recalculate the
velocity. The resulting protocol is presented in Algorithm 2.
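The receiver-side bookkeeping for one link can be sketched in a few lines of Python. This is an illustration rather than the authors' implementation: the class `IncomingLink`, its method names, and the helper `replay` are hypothetical, and the initial values (velocity 0, round counter 1) as well as the reset of the counter to 0 on reception are assumptions chosen so that the first velocity is well defined.

```python
from dataclasses import dataclass


@dataclass
class IncomingLink:
    """Receiver-side state kept for one neighbor j (illustrative names)."""
    flow: float = 0.0      # F_in(j): last cumulative inflow actually received
    velocity: float = 0.0  # V(j): per-round increase of the inflow (assumed start: 0)
    rounds: int = 1        # R(j): rounds since the last message (assumed start: 1)

    def receive(self, new_flow: float) -> None:
        """A message arrived: recompute the velocity, then store the new flow."""
        self.velocity = (new_flow - self.flow) / self.rounds
        self.rounds = 0
        self.flow = new_flow

    def predicted_flow(self) -> float:
        """Linear prediction of the inflow; equals `flow` right after a receive."""
        return self.flow + self.velocity * self.rounds

    def end_round(self) -> None:
        """Advance the round counter, whether or not a message arrived."""
        self.rounds += 1


def replay(messages):
    """Feed one entry per round (a flow value, or None for a lost message)
    and return the inflow the node would use in each round."""
    link = IncomingLink()
    used = []
    for msg in messages:
        if msg is not None:
            link.receive(msg)
        used.append(link.predicted_flow())
        link.end_round()
    return used


# The sender's flow grows by 6.25 per round; rounds 3 and 4 are lost.
print(replay([6.25, 12.5, None, None, 31.25]))
# prints [6.25, 12.5, 18.75, 25.0, 31.25]
```

In this example the predicted values for the two lost rounds coincide with what the sender would have transmitted, because the sender's velocity was already constant.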

Under no message loss, MDFU-LP is the same as MDFU, and the theoretical
results on convergence speed also apply to MDFU-LP. Under message loss, the
velocities converge over time and the prediction becomes increasingly
accurate. Therefore, message loss should not cause discrete deviations in the
estimate, allowing the estimation error to converge to zero.

Algorithm 2: MDFU-LP. Pseudocode for node $i$. $e_i$ is the estimate of node
$i$. $F_{in}(j)$ is the cumulative inflow from node $j$. $F_{out}(j)$ is the
cumulative outflow to node $j$. $V(j)$ is the velocity of incoming flow from
node $j$. $R(j)$ is the number of rounds since the last message received from
node $j$.

    // initialization
    $e_i \leftarrow v_i$
    foreach $j \in N_i$ do
        $F_{in}(j) \leftarrow 0$
        $F_{out}(j) \leftarrow e_i / (2 D_{ij})$
        $V(j) \leftarrow 0$
        $R(j) \leftarrow 1$
    foreach round do
        // communication phase
        foreach $j \in N_i$ do
            Send $j$ message $\langle i, F_{out}(j) \rangle$
        // computation phase
        foreach $\langle j, F \rangle$ received do
            $V(j) \leftarrow (F - F_{in}(j)) / R(j)$
            $R(j) \leftarrow 0$
            $F_{in}(j) \leftarrow F$
        $e_i \leftarrow v_i + \sum_{j \in N_i} (F_{in}(j) + V(j) \times R(j) - F_{out}(j))$
        foreach $j \in N_i$ do
            $F_{out}(j) \leftarrow F_{out}(j) + e_i / (2 D_{ij})$
            $R(j) \leftarrow R(j) + 1$

We have evaluated MDFU-LP for the same network as before, but now with a wide
range of message loss rates. We have observed that the behavior under message
loss rates below 50% is almost indistinguishable from the behavior under no
message loss. Figure 4 shows the CVRMSE and maximum relative error for 0%,
60%, 70%, and 80% message loss rates. It can be seen that even for a 60% loss
rate, after 60 rounds we have basically the same estimation errors as under
no message loss.
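To make the difference between the two protocols tangible on a toy instance, the following Python sketch simulates synchronous rounds of MDFU and MDFU-LP on a 4-node ring. It is not the experimental setup used above. The function `simulate`, the `lost` callback, and the burst-loss scenario are hypothetical, $D_{ij}$ is assumed to be the larger of the two endpoint degrees, and $V(j)$ is assumed to start at 0 with $R(j)$ reset to 0 on reception.

```python
def simulate(values, neighbors, rounds, lost, predict):
    """Synchronous sketch of MDFU (predict=False) or MDFU-LP (predict=True).

    neighbors: dict node -> list of neighbors (undirected graph).
    lost(t, sender, receiver): True if that message is dropped in round t.
    Returns one list of estimates per round.
    """
    nodes = list(neighbors)
    # Assumption: D_ij is the larger degree of the two endpoints of the link.
    d = {(i, j): max(len(neighbors[i]), len(neighbors[j]))
         for i in nodes for j in neighbors[i]}
    est = dict(values)                                   # e_i <- v_i
    f_in = {link: 0.0 for link in d}                     # F_in(j) at i, key (i, j)
    f_out = {(i, j): est[i] / (2 * d[i, j]) for (i, j) in d}
    vel = {link: 0.0 for link in d}                      # V(j), assumed start 0
    since = {link: 1 for link in d}                      # R(j)
    history = []
    for t in range(rounds):
        sent = dict(f_out)  # communication phase: freeze what everyone sends
        for i in nodes:     # computation phase
            for j in neighbors[i]:
                if lost(t, j, i):
                    continue
                vel[i, j] = (sent[j, i] - f_in[i, j]) / since[i, j]
                since[i, j] = 0
                f_in[i, j] = sent[j, i]
            est[i] = values[i] + sum(
                f_in[i, j]
                + (vel[i, j] * since[i, j] if predict else 0.0)
                - f_out[i, j]
                for j in neighbors[i])
            for j in neighbors[i]:
                f_out[i, j] += est[i] / (2 * d[i, j])
                since[i, j] += 1
        history.append([est[i] for i in nodes])
    return history


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
inputs = {0: 10.0, 1: 20.0, 2: 30.0, 3: 40.0}   # true average: 25


def burst(t, sender, receiver):
    """Hypothetical loss pattern: every message is dropped in rounds 60..62."""
    return 60 <= t < 63


mdfu = simulate(inputs, ring, 120, burst, predict=False)
mdfu_lp = simulate(inputs, ring, 120, burst, predict=True)
print([round(e, 6) for e in mdfu[60]])     # prints [12.5, 12.5, 12.5, 12.5]
print([round(e, 6) for e in mdfu_lp[60]])  # prints [25.0, 25.0, 25.0, 25.0]
```

In this toy run the estimates have converged before the burst, so the stored velocities already match the senders' increments and the prediction hides the loss completely. With losses occurring before velocities converge, the prediction is only approximate.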

6 Continuous Estimation Over Time-Varying Input Values

Up to this point we have considered that the input values $v_i$ are fixed
throughout the computation. In most practical situations this will not be the
case, and input values will change over time. The common approach in MD
algorithms is to periodically reset the algorithm and start a new run that
freezes the new input values and aggregates the new average. Naturally, resets
are inefficient, and mechanisms that can adapt the ongoing computation have
the potential to adjust the estimates in far fewer rounds.

Fig. 4. Coefficient of variation of the RMSE and maximum relative error for
MDFU-LP in a 1000-node, 5000-link Erdős-Rényi network.

Fig. 5. Estimated value over rounds in a 1000-node, 5000-link Erdős-Rényi
network, with changes of the initial input value at 50% of the nodes.

Without any further modifications, MDFU (and MDFU-LP) share with FU the
capability of adapting to input value changes, since $v_i$ is considered in
the computation of the local estimate $e_i$, and this regulates how much the
outgoing flows are to be incremented. If $v_i$ decreases, $e_i$ decreases in
the same proportion and node $i$ will share less through its flows to the
neighbours. The converse occurs when $v_i$ increases. The overall effect is
convergence to the new average, even if multiple nodes have their input values
changing.
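The adaptation mechanism can be observed on a toy instance with a short Python sketch. This is an illustrative scenario, not the experiment reported in this section: it uses loss-free MDFU on a 4-node ring with a single step change of one input. The function `mdfu_dynamic` and the callback `value_at` are hypothetical names, and $D_{ij}$ is assumed to be the larger of the two endpoint degrees.

```python
def mdfu_dynamic(neighbors, value_at, rounds):
    """Loss-free MDFU sketch where node i re-reads its input value_at(t, i)
    in every round instead of freezing it at start-up."""
    nodes = list(neighbors)
    # Assumption: D_ij is the larger degree of the two endpoints of the link.
    d = {(i, j): max(len(neighbors[i]), len(neighbors[j]))
         for i in nodes for j in neighbors[i]}
    est = {i: value_at(0, i) for i in nodes}
    f_out = {(i, j): est[i] / (2 * d[i, j]) for (i, j) in d}
    trace = []
    for t in range(rounds):
        sent = dict(f_out)  # no loss: the inflow from j is exactly what j sent
        for i in nodes:
            est[i] = value_at(t, i) + sum(
                sent[j, i] - f_out[i, j] for j in neighbors[i])
            for j in neighbors[i]:
                f_out[i, j] += est[i] / (2 * d[i, j])
        trace.append([est[i] for i in nodes])
    return trace


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
base = [10.0, 20.0, 30.0, 40.0]          # average 25


def value_at(t, i):
    """Node 0 jumps from 10 to 50 at round 60, moving the average to 35."""
    return base[i] + (40.0 if i == 0 and t >= 60 else 0.0)


trace = mdfu_dynamic(ring, value_at, 120)
print([round(e, 6) for e in trace[59]])   # before the change: all close to 25
print([round(e, 6) for e in trace[60]])   # node 0 jumps by the full change
print([round(e, 6) for e in trace[119]])  # after re-convergence: all close to 35
```

No reset is involved: the change in the input shifts the local estimate of node 0 immediately, and the subsequent flow increments spread the difference to the other nodes.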

In Figure 5 we show an example of how MDFU handles input value changes. In
this setting, starting at round 50 and for 50 rounds, we increase the input
value of 500 nodes (a random half of the 1000 nodes) by 5% in each round. In
the following 50 rounds, the same 500 nodes have their values decreased by 5%
per round. Initial input values are chosen uniformly at random (from 25 to 35)
and the run is made with message loss at 10%. In Figure 5 one can observe that
individual estimates (see footnote 5) closely follow the global average, with
only a slight lag of a few rounds.

Notice that the lag could never be zero, since we update the new global
average (black line) instantaneously, and even the fastest theoretical
algorithm would need information that takes a number of rounds equal to the
network diameter to acquire.



5 To avoid cluttering, the graph only shows the evolution of individual
estimates for a random sample of 100 of the 1000 nodes.